Skip to content

Conversation

@tcard
Copy link
Contributor

@tcard tcard commented Dec 26, 2025

What this PR does

storage/ingest#parallelStorageShards currently shrugs off lifetime management of the WriteRequest's underlying buffer. Although this doesn't really uphold the contract of mimirpb.WriteRequest, at the moment this doesn't matter in practice, because we don't pool those buffers in ingest storage, and thus we can leave the GC to figure out lifetimes.

However, this is a blocker for #13609, in which we introduce manual memory management for a fraction of WriteRequests. We need the WriteRequest's mem.Buffer's reference counts to eventually reach zero, so we release back their memory.

This would also become a problem if we introduce pooling at some point.

After this change, parallelStorageShards properly transfers ownership of those buffers from the original WriteRequests downstream, by increasing and decreasing each buffer's reference count accordingly.

This is a bit involved, as each shard constructs "synthetic" WriteRequests that incorporates references to multiple buffers, and also, each buffer can be referenced by multiple of those synthetic WriteRequests. To track this shared ownership, WriteRequest.sourceBufferHolders has been added.

Which issue(s) this PR fixes or relates to

Spun off from #13609

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Strengthens memory management for protobuf buffers in ingest and tests to prevent leaks and use-after-free.

  • WriteRequest buffer ownership: Add WriteRequest.sourceBufferHolders, AddSourceBufferHolder(), and FreeBuffer() to retain and release strong refs to source mem.Buffers; expose BufferHolder.Buffer().
  • Ingest path changes: In PushToStorageAndReleaseRequest, defer request.FreeBuffer() and pass &request.BufferHolder into batching. batchingQueue.AddToBatch()/AddMetadataToBatch() now accept a *BufferHolder and call AddSourceBufferHolder() so synthetic batches keep source buffers alive.
  • Tests updated: Frontend protobuf tests now obtain msg.Buffer(), defer buf.Free(), and msg.SetBuffer(nil). New mimirpb/testutil instruments buffer refcounts; pusher tests compare marshaled requests, ensure no buffer leaks, and use new AddToBatch signatures. Reflect-based tests include the new sourceBufferHolders field.

Written by Cursor Bugbot for commit 8296fec. This will update automatically on new commits. Configure here.

@tcard tcard added the changelog-not-needed PRs that don't need a CHANGELOG.md entry label Dec 26, 2025
@tcard tcard requested a review from a team as a code owner December 26, 2025 16:35
msg, err := resp.Next(ctx)
require.NoError(t, err)
msg.FreeBuffer() // We don't care about the contents of the buffer in the assertion below.
defer msg.Buffer().Free()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in this file aren't strictly needed (but they are if we incorporate #13609), but again, it's about upholding the contract: msg's buffer should no longer be referenced once msg.FreeBuffer is called, but the assertion below references it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand these additions to this test, but I just wanted to make sure your intention matches the obscure way defer works with chained method calls. (That Buffer() is called NOW, and Free() is called when the scope exits.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but let's make that clear: 33a1570

@tcard tcard force-pushed the storage-shards-ref-count branch from b418c41 to aef907b Compare December 26, 2025 16:43
Copy link
Contributor

@seizethedave seizethedave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a couple of questions.

}

func (b *memBufferWithInstrumentedRefCount) Free() {
b.refCount.Sub(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about panicking if this ever goes negative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

msg, err := resp.Next(ctx)
require.NoError(t, err)
msg.FreeBuffer() // We don't care about the contents of the buffer in the assertion below.
defer msg.Buffer().Free()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand these additions to this test, but I just wanted to make sure your intention matches the obscure way defer works with chained method calls. (That Buffer() is called NOW, and Free() is called when the scope exits.)

buf = mem.SliceBuffer([]byte("fake data"))
}
ibuf := &memBufferWithInstrumentedRefCount{Buffer: buf}
ibuf.refCount.Add(1) // Match the refCount of buf.Buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"// Match the refCount of buf.Buffer" -- is this necessarily true? If this is test support code, maybe we need to assert it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can't, there's no method on mem.Buffer to check the current ref count.

@tcard tcard merged commit c1a52cf into main Jan 9, 2026
39 checks passed
@tcard tcard deleted the storage-shards-ref-count branch January 9, 2026 12:24
tcard added a commit that referenced this pull request Jan 9, 2026
#### What this PR does

Adds a `common.instrument-reference-leaks-pct` flag. When set to >0, a
percentage of gRPC BufferHolder objects (most prominently, WriteRequest)
will be instrumented so that, once the buffer's reference count reaches
zero, if there are remaining references to the buffer, a panic is
thrown.

For this to work, either the object needs to come from a generated gRPC
server using our custom global codec, or the newly introduced
`mimirpb.Unmarshal` must be used.

Since ingester storage doesn't receive WriteRequests from gRPC anymore
but from Kafka, `storage/ingest` now calls `mimirpb.Unmarshal`.

To-do: plumb this mechanism with the distributor too, which has its own
unmarshaling path.

#### Which issue(s) this PR fixes or relates to

Follow up of #13573

Builds on top of #13888

#### Checklist

- [x] Tests updated.
- [ ] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [x]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces experimental instrumentation to detect leaked references to
pooled gRPC buffers and integrates it across config, codec, and ingest
paths.
> 
> - New global gRPC codec with leak-tracking and `mimirpb.Unmarshal()`;
`storage/ingest` now uses `mimirpb.Unmarshal`, and
`TSDBBuilder.PushToStorageAndReleaseRequest` defers `req.FreeBuffer()`
> - Adds experimental `common.instrument-reference-leaks.*` config
(percentage, before_reuse_period, max_inflight_instrumented_bytes) with
validation, inheritance, defaults, help, and docs
> - Wires config into server startup; updates config inheritance
machinery; adds focused tests for leak detection, RW2 ingest, and config
inheritance; updates CHANGELOG
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
79c2ea0. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Taylor C <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-not-needed PRs that don't need a CHANGELOG.md entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants